ChinaXiv.org: Chinese Academy of Sciences Scientific Paper Preprint Platform

3 results in total

Selected filter: Beyan, Oya

1. ChinaXiv:202211.00211

DAMS: A Distributed Analytics Metadata Schema

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-18 | Partner journal: Data Intelligence (English)

Welten, Sascha; Neumann, Laurenz; Yediel, Yeliz Ucer; da Silva Santos, Luiz Olavo Bonino; Decker, Stefan; Beyan, Oya

Abstract: In recent years, implementations enabling Distributed Analytics (DA) have gained considerable attention due to their ability to perform complex analysis tasks on decentralised data by bringing the analysis to the data. These concepts propose privacy-enhancing alternatives to data centralisation approaches, which have restricted applicability in the case of sensitive data due to ethical, legal or social aspects. Nevertheless, the inherent problem of DA-enabling architectures is the black-box-like behaviour of the highly distributed components, which originates from the lack of semantically enriched descriptions, particularly the absence of basic metadata for data sets or analysis tasks. To address these problems, we propose a metadata schema for DA infrastructures, which provides a vocabulary to enrich the involved entities with descriptive semantics. We first perform a requirement analysis with domain experts to reveal the necessary metadata items, which form the foundation of our schema. Afterwards, we transform the obtained domain expert knowledge into user stories and derive the most significant semantic content. In the final step, we enable machine-readability via RDF(S) and SHACL serialisations. We deploy our schema in a proof-of-concept monitoring dashboard to validate its contribution to the transparency of DA architectures. Additionally, we evaluate the schema's compliance with the FAIR principles. The evaluation shows that the schema succeeds in increasing transparency while being compliant with most of the FAIR principles. Because a common metadata model is critical for enhancing the compatibility between multiple DA infrastructures, our work lowers data access and analysis barriers. It represents an initial and infrastructure-independent foundation for the FAIRification of DA and the underlying scientific data management.

Views: 645 | Downloads: 196
2. ChinaXiv:202211.00179

Helping the Consumers and Producers of Standards, Repositories and Policies to Enable FAIR Data

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-16 | Partner journal: Data Intelligence (English)

McQuilton, Peter; Batista, Dominique; Beyan, Oya; Granell, Ramon; Coles, Simon; Izzo, Massimiliano; Lister, Allyson L.; Pergl, Robert; Rocca-Serra, Philippe; Schaap, Ben; Shanahan, Hugh; Thurston, Milo; Sansone, Susanna-Assunta

Abstract: Thousands of community-developed (meta)data guidelines, models, ontologies, schemas and formats have been created and implemented by several thousand data repositories and knowledge-bases, across all disciplines. These resources are necessary to meet government, funder and publisher expectations of greater transparency and access to and preservation of data related to research publications. This obligates researchers to ensure their data is FAIR, to share their data using the appropriate standards, to store their data in sustainable and community-adopted repositories, and to conform to funder and publisher data policies. FAIR data sharing also plays a key role in enabling researchers to evaluate, re-analyse and reproduce each other's work. We can map the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. In this paper, we show how the work of the GO-FAIR FAIR Standards, Repositories and Policies (StRePo) Implementation Network serves as a central integration and cross-fertilisation point for the reuse of FAIR standards, repositories and data policies in general. Pivotal to this effort is FAIRsharing, an endorsed flagship resource of the Research Data Alliance that maps the landscape of relationships between community-adopted standards and repositories, and the journal publisher and funder data policies that recommend their use. Lastly, we highlight a number of activities around FAIR tools, services and educational efforts to raise awareness and encourage participation.

Views: 458 | Downloads: 140
3. ChinaXiv:202211.00187

Distributed Analytics on Sensitive Medical Data: The Personal Health Train

Category: Computer Science >> Integration Theory of Computer Science | Submitted: 2022-11-16 | Partner journal: Data Intelligence (English)

Beyan, Oya; Choudhury, Ananya; van Soest, Johan; Kohlbacher, Oliver; Zimmermann, Lukas; Stenzhorn, Holger; Karim, Md Rezaul; Dumontier, Michel; Decker, Stefan; Santos, Luiz Olavo Bonino da Silva; Dekker, Andre

Abstract: In recent years, as newer technologies have evolved around the healthcare ecosystem, more and more data have been generated. Advanced analytics could power the data collected from numerous sources, whether from healthcare institutions or generated by individuals themselves via apps and devices, and lead to innovations in the treatment and diagnosis of diseases, improve the care given to patients, and empower citizens to participate in the decision-making process regarding their own health and well-being. However, the sensitive nature of health data prohibits healthcare organizations from sharing the data. The Personal Health Train (PHT) is a novel approach that aims to establish a distributed data analytics infrastructure enabling the (re)use of distributed healthcare data, while data owners stay in control of their own data. The main principle of the PHT is that data remain in their original location, and analytical tasks visit the data sources and are executed there. The PHT provides a distributed, flexible approach to using data in a network of participants, incorporating the FAIR principles. It facilitates the responsible use of sensitive and/or personal data by adopting international principles and regulations. This paper presents the concepts and main components of the PHT and demonstrates how it complies with the FAIR principles.

Views: 399 | Downloads: 129
